IDa-Det: An Information Discrepancy-Aware Distillation for 1-bit Detectors


FIGURE 6.14
The Mahalanobis distance of the gradients in the intermediate neck features between Res101-Res18 (gathered on the left) and Res101-BiRes18 (uniformly dispersed) on various datasets: (a) VOC trainval0712, (b) VOC test2007, (c) COCO trainval35k, (d) COCO minival.

proposal saliency maps of Res101 and Res18 (blue) is much smaller than that of Res101 and BiRes18 (orange). That is to say, the smaller the distance, the smaller the discrepancy. In brief, conventional KD methods are effective in distilling real-valued detectors, but seem much less effective in distilling 1-bit detectors.
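To make the discrepancy measure concrete, the following minimal PyTorch sketch estimates channel-wise Gaussian statistics of a proposal feature and computes a Mahalanobis-style distance between teacher and student statistics. It is only an illustration under a diagonal-covariance assumption; the function names and the pooled-variance normalization are ours, not the original implementation.

import torch

def channelwise_gaussian(feat: torch.Tensor):
    # Model a proposal feature map of shape (C, H, W) as a channel-wise
    # Gaussian: one mean and one variance per channel, estimated over
    # the spatial positions.
    flat = feat.reshape(feat.size(0), -1)
    return flat.mean(dim=1), flat.var(dim=1, unbiased=False)

def mahalanobis_discrepancy(feat_t: torch.Tensor, feat_s: torch.Tensor,
                            eps: float = 1e-6) -> torch.Tensor:
    # With a diagonal covariance, the Mahalanobis distance reduces to a
    # variance-normalized Euclidean distance between the channel means.
    mu_t, var_t = channelwise_gaussian(feat_t)
    mu_s, var_s = channelwise_gaussian(feat_s)
    d2 = ((mu_t - mu_s) ** 2 / (var_t + var_s + eps)).sum()
    return d2.sqrt()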

We are motivated by the above observation and present an information discrepancy-aware distillation method for 1-bit detectors (IDa-Det) [260], which effectively addresses the information discrepancy problem and leads to an efficient distillation process. As shown in Fig. 6.15, we introduce a discrepancy-aware method to select proposal pairs for distilling 1-bit detectors, rather than relying only on the object anchor locations of the student model or on the ground truth, as in existing methods [235, 264, 79]. We further introduce a novel entropy distillation loss that leverages more comprehensive information than conventional loss functions. Together, these components yield a powerful information discrepancy-aware distillation method for 1-bit detectors.
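As an illustration of the selection step, the sketch below ranks candidate teacher-student proposal pairs by the discrepancy measure defined above (reusing mahalanobis_discrepancy) and keeps the pairs with the largest discrepancy for distillation. The pair budget k and the exhaustive pairing are hypothetical choices made for clarity, not the exact procedure of [260].

def select_discrepant_pairs(teacher_feats, student_feats, k: int = 64):
    # Score every (teacher, student) proposal feature pair by information
    # discrepancy and keep the top-k pairs; k is a hypothetical budget.
    scores = []
    for i, f_t in enumerate(teacher_feats):
        for j, f_s in enumerate(student_feats):
            scores.append((mahalanobis_discrepancy(f_t, f_s).item(), i, j))
    scores.sort(key=lambda s: s[0], reverse=True)  # largest discrepancy first
    return [(i, j) for _, i, j in scores[:k]]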

FIGURE 6.15
Overview of the proposed information discrepancy-aware distillation (IDa-Det) framework. A real-valued teacher and a 1-bit student produce proposals (object regions, false positives, and missed detections), whose features are modeled as channel-wise Gaussian distributions φ(·). We first select representative proposal pairs based on the information discrepancy. Then we propose the entropy distillation loss to eliminate the information discrepancy.
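To sketch how an entropy-style loss can act on the channel-wise Gaussian proposal model of Fig. 6.15, the snippet below (reusing channelwise_gaussian from above) matches both the channel means and the Gaussian entropies of a selected proposal pair; since the differential entropy of a Gaussian is 0.5*log(2*pi*e*var), matching entropies reduces to matching log-variances. This is one plausible form written for illustration, not the exact loss proposed in [260].

def entropy_distillation_loss(feat_t: torch.Tensor, feat_s: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    # Match channel means and channel-wise Gaussian entropies of a
    # teacher-student proposal pair. H(N(mu, var)) = 0.5*log(2*pi*e*var),
    # so matching entropies amounts to matching log-variances.
    mu_t, var_t = channelwise_gaussian(feat_t)
    mu_s, var_s = channelwise_gaussian(feat_s)
    mean_term = (mu_t - mu_s).pow(2).mean()
    entropy_term = (torch.log(var_t + eps) - torch.log(var_s + eps)).pow(2).mean()
    return mean_term + entropy_term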